Improved Topic-based Semantic Title Evaluation and Recommendation techniques and systems
Abstract
In this thesis, we revisit, extend, and experiment with the earlier work "Semantic Title Evaluation and Recommendation Based on Topic Models" by Warren Jin et al. [13]. As its title suggests, that system addresses the problem of title evaluation and recommendation (or summarization, as some may call it). The system can be divided into two parts: semantic title evaluation (STE) and semantic title recommendation (STR). Each part can be used with either of two topic models, Latent Dirichlet Allocation (LDA) or the Segmented Topic Model (STM), giving four combinations: STESTM, STELDA, STRSTM, and STRLDA. In the previous work, STE used cosine similarity to measure how semantically similar a sentence is to its document, and then fitted a generalized extreme value (GEV) distribution to the oddly distributed cosine similarity values to produce a p-value that statistically evaluates the quality of a title [13]. However, the GEV distribution does not always fit the data well. Hence, in the current work, a new form of STE that uses a direct hypothesis test between sentence and document is developed from empirical data analysis and evaluated in a series of experiments. After an experiment on STESTM and STELDA, we found that the LDA topic model is not suitable for the new STE paradigm, so the remaining STE experiments were carried out with the STM topic model only. These experiments provide some evidence of validity for the new STE. A comparison of the results produced by the new and old STESTM shows a good level of positive correlation between them, which suggests that the new STE method does not differ greatly from the old one. Moreover, the new STE with STM also performed well when the corpus contains only one document, and proved capable of recognizing title improvements. A set of instructions for re-editing titles to improve their quality was also devised from some by-products of the system.
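The similarity step of the earlier STE can be illustrated with a minimal sketch. It assumes that a title and its document are each represented by a vector of topic proportions inferred by a topic model. The function names, the example vectors, and the empirical tail proportion used in place of the GEV fit are all illustrative assumptions, not the method of [13] or of this thesis.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_p * norm_q)


def empirical_p_value(observed, reference):
    """Smoothed share of reference similarities that are >= the observed one.

    This is a stand-in for the GEV-based p-value, not the fitted
    distribution itself; the tail direction is an assumption.
    """
    at_least = sum(1 for r in reference if r >= observed)
    return (at_least + 1) / (len(reference) + 1)


# Hypothetical topic proportions for a document, its title, and other sentences.
document = [0.70, 0.20, 0.10]
title = [0.60, 0.30, 0.10]
sentences = [[0.10, 0.80, 0.10], [0.50, 0.40, 0.10], [0.75, 0.15, 0.10]]

title_score = cosine_similarity(title, document)
reference_scores = [cosine_similarity(s, document) for s in sentences]
print(title_score, empirical_p_value(title_score, reference_scores))
```

Under this convention, a small value means that few reference sentences are as similar to the document as the title is. A GEV-based variant would replace `empirical_p_value` with a tail probability from the fitted distribution.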
STR was not thoroughly tested with various parameter settings in the previous work [13]. In theory, three parameters are critical to the performance of STR: the number of topics used in topic model training, the preprocessing setting for removing words from the training vocabulary, and the size of the corpus. A series of experiments was carried out on each of these parameters. In general, the results suggest that STR with the STM topic model performs better than STR with the LDA topic model, and that the two react to parameter variations in a similar way. Except, when varying the size of
Similar resources
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
AHP Techniques for Trust Evaluation in Semantic Web
The increasing reliance on information gathered from the web and other internet technologies raises the issue of trust. In the development of the semantic web, one major difficulty is that, by its very nature, the semantic web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each resource. Each user knows the trustworthiness of ...
Defining evaluation criteria for Health Information Systems using Human, organization and technology-fit factors (HOT-fit): systematic review
Introduction: The purpose of this study is to review a series of published studies on the evaluation of health information systems in order to determine the evaluation criteria for hospital information systems using the HOT-fit framework. Information sources or data: The present study is a review study evaluating articles from the English databases PubMed and Scopus and the Persian databases Irandoc...
Adaptive Information Analysis in Higher Education Institutes
Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or the unavailability of analysis functionality. In this ...